Filing Date: August 18, 2003

Title: LATENCY TOLERANT DISTRIBUTED SHARED MEMORY MULTIPROCESSOR COMPUTER

Page 11 Dkt: 1376.700US1

### REMARKS

This responds to the Office Action mailed on January 25, 2007.

No claims are amended, no claims are canceled, and no claims are added; as a result, claims 1, 3-8 and 11-18 are now pending in this application.

## *In the Drawings*

Fig. 9 has been amended, and new Figs. 11 and 12 have been added.

### §112 Rejection of the Claims

Claims 1, 3-8, and 11-18 were rejected under 35 U.S.C. § 112, first paragraph, as lacking adequate description or enablement. Specifically, the Office Action states that amended additional limitations regarding "a translation look-aside buffer (TLB)" and "physical page numbers" lack support by the written description (Office Action, p. 6, lines 3-15).

As noted at p. 7, lines 8-10 of Applicant's Specification, the teachings of Assignee's other application, "Remote Translation Mechanism for a Multi-node System" (U.S. Application No. 10/235,898, now U.S. Patent No. 6,922,766), are incorporated by reference into Applicant's Specification. Using the TLB for source translation of memory requests from a local node is fully described, for example, at p. 8, line 6 through p. 9, line 23 of 10/235,898. Applicant has amended the Specification to more clearly incorporate the relevant portions of 10/235,898.

Regarding the physical page numbers, as noted at p. 6, lines 24-29 of Applicant's Specification, Applicant clearly teaches that "RTT contains translation information for an entire virtual memory address space associated with the remote node." It is inherent from the teaching that the RTT has capacity to store all physical page numbers associated with the remote node (i.e., processing node).

For the reasons discussed above, the amended additional limitations regarding "a translation look-aside buffer (TLB)" and "physical page numbers" are fully supported by Applicant's Specification and/or Drawings. Reconsideration is respectfully requested.


Serial Number: 10/643,585 Filing Date: August 18, 2003


### §103 Rejection of the Claims

Claims 1, 3, 5-8 and 11-18 are rejected under 35 U.S.C. § 103(a) as being unpatentable over Scott et al. (US 6,925,547: hereinafter 'Scott') in view of Fossum et al. (US 4,888,679: hereinafter 'Fossum').

Applicant respectfully submits that neither Scott nor Fossum, alone or in combination, teaches or suggests a multiprocessor computer system having latency tolerant and scalable distributed shared memory as taught by Applicant and claimed in claims 1, 3, 5-8 and 11-18.

Scott describes a method for performing remote address translation in a multiprocessor system (Abstract, lines 1-2). Specifically, Scott describes an address translation method where a virtual address for a remote node is sent to the remote node and translated into a physical address at the remote node using a translation look-aside buffer (TLB) (*id.* lines 2-19).

Fossum describes a method and apparatus using a cache and main memory for both vector processing and scalar processing units (Abstract). Specifically, Fossum describes a scalar processor sending a vector load command to a vector processor, and also sending a vector prefetch request to the cache in response to a vector load instruction (col. 3, lines 17-20).

Neither Scott nor Fossum, however, alone or in combination, teaches or suggests using a TLB to translate memory requests from a local node while using an RTT to translate memory requests from a remote node as taught by Applicant and claimed in claims 1, 3, 5-8 and 11-18.

First of all, the Office Action states that:

(Office Action, p. 4, lines 12-17) From the description [page 6, lines 19-30 and page 7, lines 1-8 of Applicant's Specification], it appears that the [Applicant's] address transaction scheme would use the local translation means (i.e., the RTT located at the local processor/node) to translate the address if the virtual address corresponds to a local address space, and would use the remote translation means (i.e., the RTT located at the remote processor/node) to translate the address if the virtual address corresponds to a remote address space.

The Office Action also states that col. 17, lines 35-45, col. 2, lines 65-67, and col. 3, lines 1-23 of Scott show using the same RTT to translate memory requests not only from a remote node but also from a local node (Office Action, p. 5, lines 1-10). Based on this, the Office Action further states that there is no difference between Applicant's translation scheme and Scott's approach. *Id*.


As noted in the discussion of the § 112 rejection above, however, Applicant clearly teaches and claims using the TLB to translate memory references from a local processing node while using the RTT to translate memory references from another (i.e., remote) processing node. Contrary to the assertion in the Office Action, Applicant's RTT is used only to translate virtual address references to a local node when the memory reference comes from a remote node. A separate TLB associated with a local processor is used to translate virtual address references to the local node by the local processor. Under Applicant's claimed invention, therefore, accesses from remote nodes do not have to compete with local processes for space in the TLB. This reduces churning in the TLB.
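The separation described above may be illustrated by the following minimal sketch. It is offered for illustration only and is not taken from Applicant's Specification: the class name `Node`, the method names `translate_local` and `translate_remote`, the dictionary-based tables, and the two-entry TLB capacity are all assumptions chosen for brevity. The sketch shows only that references arriving from a remote node are resolved through the RTT and therefore neither insert nor evict entries in the TLB used by the local processor.

```python
from collections import OrderedDict


class Node:
    """Hypothetical model of one processing node (all names are illustrative)."""

    def __init__(self, page_table, tlb_capacity=2):
        # Full virtual-page -> physical-page mapping for this node's memory.
        self.page_table = dict(page_table)
        # RTT: modeled as holding translations for the node's entire
        # virtual address space, so a lookup never evicts anything.
        self.rtt = dict(page_table)
        # TLB: small LRU cache used only by the local processor (assumed policy).
        self.tlb = OrderedDict()
        self.tlb_capacity = tlb_capacity
        self.tlb_misses = 0

    def translate_local(self, vpn):
        """Translate a reference issued by the local processor via the TLB."""
        if vpn in self.tlb:
            self.tlb.move_to_end(vpn)  # mark as most recently used
            return self.tlb[vpn]
        self.tlb_misses += 1
        ppn = self.page_table[vpn]
        self.tlb[vpn] = ppn
        if len(self.tlb) > self.tlb_capacity:
            self.tlb.popitem(last=False)  # evict least recently used entry
        return ppn

    def translate_remote(self, vpn):
        """Translate a reference arriving from another node via the RTT."""
        return self.rtt[vpn]


node = Node({0: 100, 1: 101, 2: 102, 3: 103})
node.translate_local(0)
node.translate_local(1)
for vpn in (2, 3, 2, 3):       # traffic from a remote node
    node.translate_remote(vpn)
node.translate_local(0)        # still a TLB hit
node.translate_local(1)        # still a TLB hit
print(node.tlb_misses)
```

In this sketch the local processor incurs only its two initial misses; the intervening remote references to pages 2 and 3 leave the TLB contents untouched, which is the reduction in TLB churning noted above.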

In contrast to Applicant's separate translation approach for local and remote references, under Scott's approach, the TLB in a system HUB (SHUB) is used for both local and remote address references. Scott states that:

(col. 14, lines 9-13) If the connection endpoint is the local node, then the address translation uses a [external] TLB on the local SHUB. However, if the connection endpoint is a remote node, the address translation uses a [external] TLB on the remote node.

(col. 14, lines 47-52) The address translation mechanism used by CE 64 uses an external TLB located on the local SHUB, or an external TLB located on a remote SHUB, but does not use the TLBs which are used by the processors themselves to perform translation. Thus, the TLBs used by CE 64 may be referred to as "external" TLBs since they are external to the processors.

(col. 18, line 65 through col. 19, line 4) Thus, the address translation mechanism may be used to perform both local and remote address translations, with the [external] TLB on the local SHUB used for translating a virtual address if a CD indicates that the local node is the connection endpoint, and the [external] TLB on a remote SHUB used for translating the virtual address if a CD indicates that the remote node is the endpoint.

Although Scott discloses that the processor's TLB is separate from the [external] TLB in the SHUB, the TLBs associated with the processors do not participate in translating memory references to the local node by the processors. Instead, as quoted above and admitted in the Office Action (p. 5, lines 4-8), the separate [external] TLB in the local SHUB translates memory references by the local processors to the local node as well as memory references to the local node received from other remote nodes. This is a different approach from Applicant's invention claimed in claims 1, 8, 12 and 17.


For the reasons discussed above, neither Scott nor Fossum, alone or in combination, teaches or suggests a multiprocessor computer system having latency tolerant and yet scalable shared memory as taught by Applicant and claimed in claims 1, 3, 5-8 and 11-18. Reconsideration is respectfully requested.

Claims 3, 11, 13 and 18 are patentable as depending on a patentable base claim. In addition, neither Scott nor Fossum, alone or in combination, teaches or suggests the shared memory having a plurality of cache coherence directories as taught by Applicant and claimed in claims 3, 11, 13 and 18.

The Office Action states that Scott discloses the same limitation (p. 7, lines 9-15). In support, the Office Action points to col. 5, lines 47-67 of Scott, which states in part:

...In one embodiments, all of the coherence information is passed across the bus in the form of messages, and each processor on the bus "snoops" by monitoring the addresses on the bus and, if it finds the address of data within its own cache, invalidating that cache entry. Other cache coherence schemes can be used as well...

Applicant respectfully disagrees. Although the cited portion discloses use of a cache coherence method, the portion does not teach or suggest the specific way of using cache coherence directories located in the shared memory to maintain cache coherence as described and claimed by Applicant. Applicant is unable to find such a teaching in any of the references considered in the Office Action. Reconsideration is respectfully requested.

Claims 6 and 15 are patentable as being dependent on a patentable base claim. In addition, neither Scott nor Fossum, alone or in combination, teaches or suggests using a scalar processing unit having a scalar cache memory with a subset of cache lines stored in a processor cache as taught by Applicant and claimed in claims 6 and 15.

The Office Action (p. 7, line 20 through p. 8, line 2) states that Fossum teaches the same limitation. In support, the Office Action points to Fig. 1 (CACHE 24) and col. 4, lines 15-54 of Fossum.

Applicant respectfully disagrees. Although the cited portions show use of a cache associated with both a scalar processor (21) and a vector processor (22), the portions do not show using an additional cache dedicated to the scalar processor as taught by Applicant and claimed in claims 6 and 15. Furthermore, Fossum does not teach or suggest the specific way of connecting the scalar cache memory (920) to the cache (120) by having the scalar cache memory (920) contain only a subset of cache lines stored in the cache (120) as taught by Applicant and claimed in claims 6 and 15. Applicant is unable to find such a teaching in any of the references considered in the Office Action. Reconsideration is respectfully requested.

Claim 4 was rejected under 35 U.S.C. § 103(a) as being unpatentable over Scott et al. in view of Fossum et al., and further in view of Nakazato (US 6,782,468).

Claim 4 is patentable as being dependent on a patentable base claim. Reconsideration of claims 1, 3-8 and 11-18 is respectfully requested.

## **CONCLUSION**

Applicant respectfully submits that the claims are in condition for allowance, and notification to that effect is earnestly requested. The Examiner is invited to telephone Applicant's attorney at (612) 373-6909 to facilitate prosecution of this application.

If necessary, please charge any additional fees or credit overpayment to Deposit Account No. 19-0743.

### **Reservation of Rights**

In the interest of clarity and brevity, Applicant may not have addressed every assertion made in the Office Action. Applicant's silence regarding any such assertion does not constitute any admission or acquiescence. Applicant reserves all rights not exercised in connection with this response, such as the right to challenge or rebut any tacit or explicit characterization of any reference or of any of the present claims, the right to challenge or rebut any asserted factual or legal basis of any of the rejections, the right to swear behind any cited reference such as provided under 37 C.F.R. § 1.131 or otherwise, or the right to assert co-ownership of any cited reference. Applicant does not admit that any of the cited references or any other references of record are relevant to the present claims, or that they constitute prior art. To the extent that any rejection or assertion is based upon the Examiner's personal knowledge, rather than any objective evidence of record as manifested by a cited prior art reference, Applicant timely objects to such reliance on Official Notice, and reserves all rights to request that the Examiner provide a reference or affidavit in support of such assertion, as required by MPEP § 2144.03. Applicant reserves all rights to pursue any cancelled claims in a subsequent patent application claiming the benefit of priority of the present patent application, and to request rejoinder of any withdrawn claim, as required by MPEP § 821.04.

Respectfully submitted,

STEVEN L. SCOTT

By his Representatives,

SCHWEGMAN, LUNDBERG, WOESSNER & KLUTH, P.A.

P.O. Box 2938

Minneapolis, MN 55402

(612) 373-6909

Date March 22, 2007

Thomas F. Brennan Reg. No. 35,075

CERTIFICATE UNDER 37 CFR 1.8: The undersigned hereby certifies that this correspondence is being filed using the USPTO's electronic filing system EFS-Web, and is addressed to: Commissioner of Patents, P.O. Box 1450, Alexandria, VA 22313-1450 on this 22 day of March

| Name            | Signature   |
|-----------------|-------------|
| CANDIS BUENDING | [signature] |